Automatic Identification and Classification of Protein Domains

Authors

  • Elon Portugaly
  • Nathan Linial
  • Michal Linial
Abstract

Motivation: Proteins are composed of one or several domains. Such domains can be classified into families according to their biological function. Whereas sequencing technologies have advanced immensely in recent years, there are no matching computational tools for large-scale determination of protein domains and their boundaries. The present paper addresses the challenge of developing computational tools to identify protein domains and to classify them into their families. The eventual goal of our research is to automatically identify and correctly classify all protein domains.

Results: Our method, called EVEREST, combines methodologies from the fields of finite metric spaces, machine learning and statistical modeling, and achieves state-of-the-art results. Our process begins by constructing a database of protein segments that emerge in an all-vs-all pairwise sequence comparison. It then clusters these segments, chooses the best clusters using machine learning techniques, and creates a statistical model for each of these clusters. This procedure is then iterated: the statistical models are used to scan all protein sequences, to recreate the segment database, and to cluster the segments again. Performance tests show that EVEREST recovers 63% of Pfam families and 40% of SCOP families with high accuracy, and suggests new families with about 40% fidelity. EVEREST domains are frequently a combination of domains as defined by Pfam or SCOP, and frequently subdomains of such domains. The paper concludes with a discussion of research avenues to improve these
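The iterative loop outlined in the Results paragraph (build a segment database, cluster, select clusters, model, rescan, repeat) can be illustrated with a minimal Python sketch. This is not the EVEREST implementation: every function name, the fixed segment length K, the Hamming-distance threshold, the size-based cluster filter and the per-position residue sets are hypothetical stand-ins, chosen only to make the control flow concrete.

```python
from itertools import combinations

K = 4         # toy segment length (assumption; real segments vary in length)
MAX_DIFF = 1  # toy similarity threshold (assumption)

def windows(seq):
    """All length-K substrings of a sequence."""
    return [seq[i:i + K] for i in range(len(seq) - K + 1)]

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def initial_segments(seqs):
    """Stand-in for the all-vs-all comparison: keep windows that nearly
    match a window in some other sequence."""
    segs = set()
    for s, t in combinations(seqs, 2):
        for a in windows(s):
            for b in windows(t):
                if hamming(a, b) <= MAX_DIFF:
                    segs.update((a, b))
    return segs

def cluster(segs):
    """Single-linkage clustering: segments within MAX_DIFF share a cluster."""
    clusters = []
    for seg in sorted(segs):
        linked = [c for c in clusters
                  if any(hamming(seg, m) <= MAX_DIFF for m in c)]
        merged = {seg}.union(*linked)
        clusters = [c for c in clusters if c not in linked] + [merged]
    return clusters

def select(clusters, min_size=2):
    """Stand-in for learned cluster scoring: filter by cluster size only."""
    return [c for c in clusters if len(c) >= min_size]

def build_model(members):
    """Stand-in for a statistical model: residues seen at each position."""
    return tuple(frozenset(column) for column in zip(*members))

def matches(model, seg):
    """True if every residue of seg is allowed at its position."""
    return len(seg) == len(model) and all(
        ch in allowed for ch, allowed in zip(seg, model))

def scan(models, seqs):
    """Recreate the segment database by scanning all sequences."""
    return {w for s in seqs for w in windows(s)
            if any(matches(m, w) for m in models)}

def toy_pipeline(seqs, rounds=2):
    """Iterate: cluster segments, select, model, then rescan."""
    segs = initial_segments(seqs)
    models = []
    for _ in range(rounds):
        models = [build_model(c) for c in select(cluster(segs))]
        segs = scan(models, seqs)
    return models

if __name__ == "__main__":
    for model in toy_pipeline(["MKACDEFW", "PPACDEFY", "GACDQFLL"]):
        print([sorted(position) for position in model])
```

In this toy version each round rebuilds the models from the segments found by the previous round's scan, which mirrors the iteration described in the abstract only at the level of control flow.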

Similar resources

An Automatic Fingerprint Classification Algorithm

Manual fingerprint classification algorithms are very time-consuming and usually not accurate. Fast and accurate fingerprint classification is essential to every AFIS (Automatic Fingerprint Identification System). This paper investigates a fingerprint classification algorithm that reduces the complexity and costs associated with the fingerprint identification procedure. A new structural algorit...

Automatic Identification and Classification of the Iranian Traditional Music Scales (Dastgāh) and Melody Models (Gusheh): Analytical and Comparative Review on Conducted Research

Background and Aim: Automatic identification and classification of the Iranian traditional music scales (Dastgāh) and melody models (Gusheh) have attracted the attention of researchers for more than a decade. The current research aims to review the research conducted in this area and to consider its different approaches and obstacles. Method: The research approach is content analysis and data col...

Kohonen Self Organizing for Automatic Identification of Cartographic Objects

Automatic identification and localization of cartographic objects in aerial and satellite images have gained increasing attention in recent years in digital photogrammetry and remote sensing. Although the automatic extraction of man-made objects is in essence still an unresolved issue, man-made objects can be extracted from aerial photos and satellite images. Recently, the high-resolution s...

Automatic classification of highly related Malate Dehydrogenase and L-Lactate Dehydrogenase based on 3D-pattern of active sites

Accurate protein function prediction is an important subject in bioinformatics, especially where sequentially and structurally similar proteins have different functions. Malate dehydrogenase and L-lactate dehydrogenase are two evolutionarily related enzymes, which exist in a wide variety of organisms. These enzymes are sequentially and structurally similar and share common active site residues, spati...

Object-Based Classification of UltraCamD Imagery for Identification of Tree Species in the Mixed Planted Forest

This study assesses high-resolution digital aerial imagery for semi-automatic identification of tree species. To maximize the benefit of such data, object-based classification was conducted in a mixed forest plantation. Two subsets of an UltraCam D image were geometrically corrected using the aero-triangulation method. Some appropriate transformations were perfor...

Publication date: 2005